The use of sense in unsupervised training of acoustic models for ASR systems

Authors

  • Rita Singh
  • Benjamin Lambert
  • Bhiksha Raj
Abstract

In unsupervised training of ASR systems, no annotated data are assumed to exist. Word-level annotations for the training audio are generated iteratively using an ASR system. At each iteration, the subset of data judged to have the most reliable transcriptions is selected to train the next set of acoustic models. Data selection, however, remains a difficult problem, particularly when the error rate of the recognizer providing the initial annotation is very high. In this paper we propose an iterative algorithm that uses a combination of likelihoods and a simple model of sense to select data. We show that the algorithm is effective for unsupervised training of acoustic models, particularly when the initial annotation is highly erroneous. Experiments conducted on Fisher-1 data, using initial models from Switchboard and a vocabulary and LM derived from the Google N-grams, show that performance on a selected held-out test set from Fisher data improves more over iterations than it does with likelihood-based data selection.
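
The iterative loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the sense model is defined or how it is combined with likelihoods, so the linear interpolation, the `weight` and `keep_fraction` parameters, and the `decode`, `train`, and `sense_score` callables are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence


@dataclass
class Hypothesis:
    """One automatically generated transcription (hypothetical container)."""
    utt_id: str
    words: List[str]
    avg_log_likelihood: float  # e.g. per-frame acoustic log-likelihood


def combined_score(hyp: Hypothesis,
                   sense_score: Callable[[List[str]], float],
                   weight: float) -> float:
    # ASSUMPTION: a simple linear interpolation of the two cues. The paper's
    # actual combination rule is not given in the abstract. In practice the
    # two terms would need to be normalized to comparable ranges.
    return (1.0 - weight) * hyp.avg_log_likelihood + weight * sense_score(hyp.words)


def select_reliable(hyps: Sequence[Hypothesis],
                    sense_score: Callable[[List[str]], float],
                    weight: float = 0.5,
                    keep_fraction: float = 0.3) -> List[Hypothesis]:
    """Keep the top-scoring fraction of hypotheses (at least one)."""
    ranked = sorted(hyps,
                    key=lambda h: combined_score(h, sense_score, weight),
                    reverse=True)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:n_keep]


def unsupervised_training(audio: Sequence[Any],
                          models: Any,
                          decode: Callable[[Any, Any], Hypothesis],
                          train: Callable[[List[Hypothesis]], Any],
                          sense_score: Callable[[List[str]], float],
                          iterations: int = 3) -> Any:
    """Decode, select the most reliable subset, retrain, and repeat."""
    for _ in range(iterations):
        hyps = [decode(models, utt) for utt in audio]   # re-annotate all audio
        selected = select_reliable(hyps, sense_score)   # data selection step
        models = train(selected)                        # next acoustic models
    return models
```

Setting `weight=0.0` in this sketch reduces the selection to a purely likelihood-based ranking, which corresponds to the kind of baseline the abstract compares against.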

Similar resources

Unsupervised acoustic model training using multiple seed ASR systems

Unsupervised acoustic modeling can offer a cost and time effective way of creating a solid acoustic model for any under-resourced language. This paper explores the novel idea of using two independent ASR systems to transcribe new speech data, align and filter the ASR hypotheses and use the presumably correct transcriptions to iteratively improve the two seed ASR systems. In parallel, the newly ...

Réduction des coûts de développement de systèmes de reconnaissance de la parole à grand vocabulaire. (Reducing development costs of large vocabulary speech recognition systems)

One of the outstanding challenges in large vocabulary automatic speech recognition (ASR) is the reduction of development costs required to build a new recognition system or adapt an existing one to a new task, language or dialect. The state-of-the-art ASR systems are based on the principles of the statistical learning paradigm, using information provided by two stochastic models, an acoustic (A...

Unsupervised Testing Strategies for ASR

This paper describes unsupervised strategies for estimating relative accuracy differences between acoustic models or language models used for automatic speech recognition. To test acoustic models, the approach extends ideas used for unsupervised discriminative training to include a more explicit validation on held out data. To test language models, we use a dual interpretation of the same proce...

Speech alignment and recognition experiments for Luxembourgish

Luxembourgish, embedded in a multilingual context on the divide between Romance and Germanic cultures, remains one of Europe’s under-described languages. In this paper, we propose to study acoustic similarities between Luxembourgish and major contact languages (German, French, English) with the help of automatic speech alignment and recognition systems. Experiments were run using monolingual ac...

Two-pass decision tree construction for unsupervised adaptation of HMM-based synthesis models

Hidden Markov model (HMM) -based speech synthesis systems possess several advantages over concatenative synthesis systems. One such advantage is the relative ease with which HMM-based systems are adapted to speakers not present in the training dataset. Speaker adaptation methods used in the field of HMM-based automatic speech recognition (ASR) are adopted for this task. In the case of unsupervi...

Publication date: 2010